@sh1ng (Owner) commented Mar 13, 2024

| --request-rate --> | 12 | 12 vllm-project#2357 reorder-window=5 | 12 reorder-window 5 --scheduler-swap-tolerance 1024 | 12 reorder-window 30 --scheduler-swap-tolerance 1024 |
| --- | --- | --- | --- | --- |
| Request throughput (requests/s) | 2.40 | 2.6 | 2.63 | 2.58 |
| Input token throughput (tokens/s) | 596 | 646 | 652 | 640 |
| Output token throughput (tokens/s) | 580 | 629 | 634 | 623 |
| Mean TTFT (ms) | 137951 | 121110 | 91733 | 80507 |
| Median TTFT (ms) | 136089 | 118017 | 92819 | 91663 |
| P99 TTFT (ms) | 299581 | 269182 | 196019 | 197222 |
| Mean TPOT (ms) | 3624 | 3258 | 2895 | 2916 |
| Median TPOT (ms) | 754 | 683 | 767 | 722 |
| P99 TPOT (ms) | 30423 | 26849 | 22905 | 22320 |
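
To make the four configurations easier to compare, the figures above can be turned into percentage changes against the first column. This is only a rough sketch: treating the first column as the reference is an assumption (the comment does not label it as a baseline), and the short configuration labels and the `relative_change` helper are made up for illustration.

```python
# Figures copied from the results above. The labels are shorthand for the
# four configurations; the first one is ASSUMED to be the reference run.
CONFIGS = [
    "rate 12",
    "#2357, reorder-window=5",
    "reorder-window 5, swap-tolerance 1024",
    "reorder-window 30, swap-tolerance 1024",
]

METRICS = {
    "Request throughput (requests/s)": [2.40, 2.6, 2.63, 2.58],
    "Mean TTFT (ms)": [137951, 121110, 91733, 80507],
    "P99 TTFT (ms)": [299581, 269182, 196019, 197222],
    "Mean TPOT (ms)": [3624, 3258, 2895, 2916],
    "P99 TPOT (ms)": [30423, 26849, 22905, 22320],
}


def relative_change(values):
    """Return the percentage change of each value relative to the first one."""
    base = values[0]
    return [100.0 * (v - base) / base for v in values]


for name, values in METRICS.items():
    print(name)
    for label, pct in zip(CONFIGS, relative_change(values)):
        # Positive is better for throughput, negative is better for latency.
        print(f"  {label:<40} {pct:+6.1f}%")
```

Median TTFT and median TPOT can be added to `METRICS` in the same way if they are of interest.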
